Inference and Validation

Now that you have a trained network, you can use it for making predictions. This is typically called inference, a term borrowed from statistics. However, neural networks have a tendency to perform too well on the training data and aren't able to generalize to data that hasn't been seen before. This is called overfitting and it impairs inference performance. To test for overfitting while training, we measure the performance on data not in the training set called the validation set. We avoid overfitting through regularization such as dropout while monitoring the validation performance during training. In this notebook, I'll show you how to do this in PyTorch.

As usual, let's start by loading the dataset through torchvision. You'll learn more about torchvision and loading data in a later part. This time we'll be taking advantage of the test set which you can get by setting train=False here:

testset = datasets.FashionMNIST('~/.pytorch/F_MNIST_data/', download=True, train=False, transform=transform)

The test set contains images just like the training set. Typically you'll see 10-20% of the original dataset held out for testing and validation with the rest being used for training.


In [1]:
import torch
from torchvision import datasets, transforms

# Define a transform to normalize the data
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5, 0.5, 0.5),
                                                     (0.5, 0.5, 0.5))])
# Download and load the training data
trainset = datasets.FashionMNIST('~/.pytorch/F_MNIST_data/',
                                 download=True,
                                 train=True,
                                 transform=transform)
trainloader = torch.utils.data.DataLoader(dataset=trainset,
                                          batch_size=64,
                                          shuffle=True)

# Download and load the test data
testset = datasets.FashionMNIST('~/.pytorch/F_MNIST_data/',
                                download=True,
                                train=False,
                                transform=transform)
testloader = torch.utils.data.DataLoader(dataset=testset,
                                         batch_size=64,
                                         shuffle=True)

Here I'll create a model like normal, using the same one from my solution for part 4.


In [2]:
from torch import nn, optim
import torch.nn.functional as F

class Classifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 256)
        self.fc2 = nn.Linear(256, 128)
        self.fc3 = nn.Linear(128, 64)
        self.fc4 = nn.Linear(64, 10)
        
    def forward(self, x):
        # make sure input tensor is flattened
        x = x.view(x.shape[0], -1)
        
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = F.log_softmax(self.fc4(x), dim=1)
        
        return x

The goal of validation is to measure the model's performance on data that isn't part of the training set. Performance here is up to the developer to define though. Typically this is just accuracy, the percentage of classes the network predicted correctly. Other options are precision and recall) and top-5 error rate. We'll focus on accuracy here. First I'll do a forward pass with one batch from the test set.


In [3]:
model = Classifier()

images, labels = next(iter(testloader))
# Get the class probabilities
ps = torch.exp(model(images))
# Make sure the shape is appropriate, we should get 10 class probabilities for 64 examples
print(ps.shape)


torch.Size([64, 10])

With the probabilities, we can get the most likely class using the ps.topk method. This returns the $k$ highest values. Since we just want the most likely class, we can use ps.topk(1). This returns a tuple of the top-$k$ values and the top-$k$ indices. If the highest value is the fifth element, we'll get back 4 as the index.


In [4]:
top_p, top_class = ps.topk(1, dim=1)
# Look at the most likely classes for the first 10 examples
print(top_class[:10,:])


tensor([[ 9],
        [ 9],
        [ 4],
        [ 6],
        [ 6],
        [ 9],
        [ 9],
        [ 6],
        [ 5],
        [ 3]])

Now we can check if the predicted classes match the labels. This is simple to do by equating top_class and labels, but we have to be careful of the shapes. Here top_class is a 2D tensor with shape (64, 1) while labels is 1D with shape (64). To get the equality to work out the way we want, top_class and labels must have the same shape.

If we do

equals = top_class == labels

equals will have shape (64, 64), try it yourself. What it's doing is comparing the one element in each row of top_class with each element in labels which returns 64 True/False boolean values for each row.


In [5]:
equals = top_class == labels.view(*top_class.shape)

Now we need to calculate the percentage of correct predictions. equals has binary values, either 0 or 1. This means that if we just sum up all the values and divide by the number of values, we get the percentage of correct predictions. This is the same operation as taking the mean, so we can get the accuracy with a call to torch.mean. If only it was that simple. If you try torch.mean(equals), you'll get an error

RuntimeError: mean is not implemented for type torch.ByteTensor

This happens because equals has type torch.ByteTensor but torch.mean isn't implement for tensors with that type. So we'll need to convert equals to a float tensor. Note that when we take torch.mean it returns a scalar tensor, to get the actual value as a float we'll need to do accuracy.item().


In [6]:
accuracy = torch.mean(equals.type(torch.FloatTensor))
print(f'Accuracy: {accuracy.item()*100}%')


Accuracy: 9.375%

The network is untrained so it's making random guesses and we should see an accuracy around 10%. Now let's train our network and include our validation pass so we can measure how well the network is performing on the test set. Since we're not updating our parameters in the validation pass, we can speed up our code by turning off gradients using torch.no_grad():

# turn off gradients
with torch.no_grad():
    # validation pass here
    for images, labels in testloader:
        ...

Exercise: Implement the validation loop below and print out the total accuracy after the loop. You can largely copy and paste the code from above, but I suggest typing it in because writing it out yourself is essential for building the skill. In general you'll always learn more by typing it rather than copy-pasting. You should be able to get an accuracy above 80%.


In [7]:
model = Classifier()
criterion = nn.NLLLoss()
optimizer = optim.Adam(params=model.parameters(),
                       lr=0.003)

epochs = 50
steps = 0

train_losses, test_losses = [], []
for e in range(epochs):
    running_loss = 0
    for images, labels in trainloader:
        
        optimizer.zero_grad()
        
        log_ps = model(images)
        loss = criterion(log_ps, labels)
        loss.backward()
        optimizer.step()
        
        running_loss += loss.item()
        
    else:
        ## TODO: Implement the validation pass and print out the validation accuracy
        ### Resetting the validation values each epoch
        test_loss = 0
        accuracy = 0
        with torch.no_grad():
            for images, labels in testloader:
                # Calculate the log probabilities
                log_probs = model(images)
                # Add the loss of the batch to the total loss
                test_loss += criterion(log_probs, labels)
                
                ps = torch.exp(log_probs)
                # Choose the predicted class with the highest probability
                top_p, top_class = ps.topk(k=1, dim=1)
                 # Calculate number of correct guesses for the batch
                equals = top_class == labels.view(*top_class.shape)
                accuracy += torch.mean(equals.type(torch.FloatTensor)) 
        train_losses.append(running_loss / len(trainloader))
        test_losses.append(test_loss / len(testloader))
        print("Epoch: {}/{}.. ".format(e + 1, epochs),
              "Training Loss: {:.3f}.. ".format(running_loss / len(trainloader)),
              "Test Loss: {:.3f}.. ".format(test_loss / len(testloader)),
              "Test Accuracy: {:.3f}".format(accuracy / len(testloader)))


Epoch: 1/50..  Training Loss: 0.509..  Test Loss: 0.426..  Test Accuracy: 0.850
Epoch: 2/50..  Training Loss: 0.391..  Test Loss: 0.405..  Test Accuracy: 0.854
Epoch: 3/50..  Training Loss: 0.356..  Test Loss: 0.368..  Test Accuracy: 0.864
Epoch: 4/50..  Training Loss: 0.331..  Test Loss: 0.396..  Test Accuracy: 0.858
Epoch: 5/50..  Training Loss: 0.310..  Test Loss: 0.375..  Test Accuracy: 0.868
Epoch: 6/50..  Training Loss: 0.304..  Test Loss: 0.387..  Test Accuracy: 0.867
Epoch: 7/50..  Training Loss: 0.287..  Test Loss: 0.368..  Test Accuracy: 0.876
Epoch: 8/50..  Training Loss: 0.281..  Test Loss: 0.391..  Test Accuracy: 0.867
Epoch: 9/50..  Training Loss: 0.271..  Test Loss: 0.359..  Test Accuracy: 0.875
Epoch: 10/50..  Training Loss: 0.265..  Test Loss: 0.374..  Test Accuracy: 0.869
Epoch: 11/50..  Training Loss: 0.256..  Test Loss: 0.360..  Test Accuracy: 0.873
Epoch: 12/50..  Training Loss: 0.252..  Test Loss: 0.363..  Test Accuracy: 0.880
Epoch: 13/50..  Training Loss: 0.250..  Test Loss: 0.388..  Test Accuracy: 0.875
Epoch: 14/50..  Training Loss: 0.240..  Test Loss: 0.423..  Test Accuracy: 0.855
Epoch: 15/50..  Training Loss: 0.235..  Test Loss: 0.378..  Test Accuracy: 0.879
Epoch: 16/50..  Training Loss: 0.229..  Test Loss: 0.401..  Test Accuracy: 0.872
Epoch: 17/50..  Training Loss: 0.224..  Test Loss: 0.382..  Test Accuracy: 0.878
Epoch: 18/50..  Training Loss: 0.220..  Test Loss: 0.397..  Test Accuracy: 0.869
Epoch: 19/50..  Training Loss: 0.217..  Test Loss: 0.372..  Test Accuracy: 0.885
Epoch: 20/50..  Training Loss: 0.214..  Test Loss: 0.365..  Test Accuracy: 0.883
Epoch: 21/50..  Training Loss: 0.209..  Test Loss: 0.386..  Test Accuracy: 0.883
Epoch: 22/50..  Training Loss: 0.206..  Test Loss: 0.389..  Test Accuracy: 0.883
Epoch: 23/50..  Training Loss: 0.206..  Test Loss: 0.398..  Test Accuracy: 0.877
Epoch: 24/50..  Training Loss: 0.202..  Test Loss: 0.390..  Test Accuracy: 0.884
Epoch: 25/50..  Training Loss: 0.198..  Test Loss: 0.422..  Test Accuracy: 0.878
Epoch: 26/50..  Training Loss: 0.188..  Test Loss: 0.420..  Test Accuracy: 0.878
Epoch: 27/50..  Training Loss: 0.188..  Test Loss: 0.440..  Test Accuracy: 0.878
Epoch: 28/50..  Training Loss: 0.189..  Test Loss: 0.428..  Test Accuracy: 0.889
Epoch: 29/50..  Training Loss: 0.180..  Test Loss: 0.426..  Test Accuracy: 0.879
Epoch: 30/50..  Training Loss: 0.182..  Test Loss: 0.408..  Test Accuracy: 0.886
Epoch: 31/50..  Training Loss: 0.182..  Test Loss: 0.411..  Test Accuracy: 0.884
Epoch: 32/50..  Training Loss: 0.179..  Test Loss: 0.413..  Test Accuracy: 0.883
Epoch: 33/50..  Training Loss: 0.175..  Test Loss: 0.420..  Test Accuracy: 0.884
Epoch: 34/50..  Training Loss: 0.175..  Test Loss: 0.443..  Test Accuracy: 0.883
Epoch: 35/50..  Training Loss: 0.174..  Test Loss: 0.435..  Test Accuracy: 0.877
Epoch: 36/50..  Training Loss: 0.168..  Test Loss: 0.439..  Test Accuracy: 0.880
Epoch: 37/50..  Training Loss: 0.170..  Test Loss: 0.432..  Test Accuracy: 0.888
Epoch: 38/50..  Training Loss: 0.158..  Test Loss: 0.455..  Test Accuracy: 0.883
Epoch: 39/50..  Training Loss: 0.169..  Test Loss: 0.492..  Test Accuracy: 0.881
Epoch: 40/50..  Training Loss: 0.154..  Test Loss: 0.472..  Test Accuracy: 0.885
Epoch: 41/50..  Training Loss: 0.161..  Test Loss: 0.483..  Test Accuracy: 0.874
Epoch: 42/50..  Training Loss: 0.161..  Test Loss: 0.461..  Test Accuracy: 0.889
Epoch: 43/50..  Training Loss: 0.151..  Test Loss: 0.454..  Test Accuracy: 0.883
Epoch: 44/50..  Training Loss: 0.156..  Test Loss: 0.482..  Test Accuracy: 0.882
Epoch: 45/50..  Training Loss: 0.150..  Test Loss: 0.499..  Test Accuracy: 0.878
Epoch: 46/50..  Training Loss: 0.168..  Test Loss: 0.439..  Test Accuracy: 0.884
Epoch: 47/50..  Training Loss: 0.153..  Test Loss: 0.481..  Test Accuracy: 0.885
Epoch: 48/50..  Training Loss: 0.133..  Test Loss: 0.509..  Test Accuracy: 0.885
Epoch: 49/50..  Training Loss: 0.142..  Test Loss: 0.500..  Test Accuracy: 0.886
Epoch: 50/50..  Training Loss: 0.141..  Test Loss: 0.546..  Test Accuracy: 0.882

In [8]:
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

import matplotlib.pyplot as plt

In [9]:
plt.plot(train_losses, label='Training loss')
plt.plot(test_losses, label='Validation loss')
plt.legend(frameon=False)


Out[9]:
<matplotlib.legend.Legend at 0x1be9e9725f8>

Overfitting

If we look at the training and validation losses as we train the network, we can see a phenomenon known as overfitting.

The network learns the training set better and better, resulting in lower training losses. However, it starts having problems generalizing to data outside the training set leading to the validation loss increasing. The ultimate goal of any deep learning model is to make predictions on new data, so we should strive to get the lowest validation loss possible. One option is to use the version of the model with the lowest validation loss, here the one around 8-10 training epochs. This strategy is called early-stopping. In practice, you'd save the model frequently as you're training then later choose the model with the lowest validation loss.

The most common method to reduce overfitting (outside of early-stopping) is dropout, where we randomly drop input units. This forces the network to share information between weights, increasing it's ability to generalize to new data. Adding dropout in PyTorch is straightforward using the nn.Dropout module.

class Classifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 256)
        self.fc2 = nn.Linear(256, 128)
        self.fc3 = nn.Linear(128, 64)
        self.fc4 = nn.Linear(64, 10)

        # Dropout module with 0.2 drop probability
        self.dropout = nn.Dropout(p=0.2)

    def forward(self, x):
        # make sure input tensor is flattened
        x = x.view(x.shape[0], -1)

        # Now with dropout
        x = self.dropout(F.relu(self.fc1(x)))
        x = self.dropout(F.relu(self.fc2(x)))
        x = self.dropout(F.relu(self.fc3(x)))

        # output so no dropout here
        x = F.log_softmax(self.fc4(x), dim=1)

        return x

During training we want to use dropout to prevent overfitting, but during inference we want to use the entire network. So, we need to turn off dropout during validation, testing, and whenever we're using the network to make predictions. To do this, you use model.eval(). This sets the model to evaluation mode where the dropout probability is 0. You can turn dropout back on by setting the model to train mode with model.train(). In general, the pattern for the validation loop will look like this, where you turn off gradients, set the model to evaluation mode, calculate the validation loss and metric, then set the model back to train mode.

# turn off gradients
with torch.no_grad():

    # set model to evaluation mode
    model.eval()

    # validation pass here
    for images, labels in testloader:
        ...

# set model back to train mode
model.train()

Exercise: Add dropout to your model and train it on Fashion-MNIST again. See if you can get a lower validation loss or higher accuracy.


In [10]:
## TODO: Define your model with dropout added
class MyModelWithDropout(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 256)
        self.fc2 = nn.Linear(256, 128)
        self.fc3 = nn.Linear(128, 64)
        self.fc4 = nn.Linear(64, 10)
        
        # Dropout with 0.2 probability of droppinga value
        self.dropout = nn.Dropout(p=0.2)
        
    def forward(self, x):
        # make sure input tensor is flattened
        x = x.view(x.shape[0], -1)
        
       # Now with dropout
        x = self.dropout(F.relu(self.fc1(x)))
        x = self.dropout(F.relu(self.fc2(x)))
        x = self.dropout(F.relu(self.fc3(x)))
        
        x = F.log_softmax(self.fc4(x), dim=1)
        
        return x

In [11]:
## TODO: Train your model with dropout, and monitor the training progress with the validation loss and accuracy
model = MyModelWithDropout()
criterion = nn.NLLLoss()
optimizer = optim.Adam(params=model.parameters(),
                       lr=0.003)

epochs = 50
steps = 0

train_losses, test_losses = [], []
for e in range(epochs):
    running_loss = 0
    for images, labels in trainloader:
        
        optimizer.zero_grad()
        
        log_ps = model(images)
        loss = criterion(log_ps, labels)
        loss.backward()
        optimizer.step()
        
        running_loss += loss.item()
        
    else:
        ## TODO: Implement the validation pass and print out the validation accuracy
        ### Resetting the validation values each epoch
        test_loss = 0
        accuracy = 0
        with torch.no_grad():
            for images, labels in testloader:
                # Calculate the log probabilities
                log_probs = model(images)
                # Add the loss of the batch to the total loss
                test_loss += criterion(log_probs, labels)
                # Obtaining softmax from log_softmax last layer activation
                ps = torch.exp(log_probs)
                # Choose the predicted class with the highest probability
                top_p, top_class = ps.topk(k=1, dim=1)
                 # Calculate number of correct guesses for the batch
                equals = top_class == labels.view(*top_class.shape)
                accuracy += torch.mean(equals.type(torch.FloatTensor)) 
        train_losses.append(running_loss / len(trainloader))
        test_losses.append(test_loss / len(testloader))
        print("Epoch: {}/{}.. ".format(e + 1, epochs),
              "Training Loss: {:.3f}.. ".format(running_loss / len(trainloader)),
              "Test Loss: {:.3f}.. ".format(test_loss / len(testloader)),
              "Test Accuracy: {:.3f}".format(accuracy / len(testloader)))


Epoch: 1/50..  Training Loss: 0.608..  Test Loss: 0.507..  Test Accuracy: 0.824
Epoch: 2/50..  Training Loss: 0.477..  Test Loss: 0.492..  Test Accuracy: 0.821
Epoch: 3/50..  Training Loss: 0.452..  Test Loss: 0.501..  Test Accuracy: 0.825
Epoch: 4/50..  Training Loss: 0.432..  Test Loss: 0.489..  Test Accuracy: 0.838
Epoch: 5/50..  Training Loss: 0.421..  Test Loss: 0.481..  Test Accuracy: 0.834
Epoch: 6/50..  Training Loss: 0.412..  Test Loss: 0.471..  Test Accuracy: 0.840
Epoch: 7/50..  Training Loss: 0.402..  Test Loss: 0.496..  Test Accuracy: 0.840
Epoch: 8/50..  Training Loss: 0.399..  Test Loss: 0.477..  Test Accuracy: 0.838
Epoch: 9/50..  Training Loss: 0.388..  Test Loss: 0.474..  Test Accuracy: 0.840
Epoch: 10/50..  Training Loss: 0.391..  Test Loss: 0.473..  Test Accuracy: 0.843
Epoch: 11/50..  Training Loss: 0.383..  Test Loss: 0.467..  Test Accuracy: 0.845
Epoch: 12/50..  Training Loss: 0.381..  Test Loss: 0.444..  Test Accuracy: 0.854
Epoch: 13/50..  Training Loss: 0.377..  Test Loss: 0.480..  Test Accuracy: 0.847
Epoch: 14/50..  Training Loss: 0.374..  Test Loss: 0.466..  Test Accuracy: 0.842
Epoch: 15/50..  Training Loss: 0.377..  Test Loss: 0.472..  Test Accuracy: 0.850
Epoch: 16/50..  Training Loss: 0.367..  Test Loss: 0.449..  Test Accuracy: 0.852
Epoch: 17/50..  Training Loss: 0.366..  Test Loss: 0.492..  Test Accuracy: 0.840
Epoch: 18/50..  Training Loss: 0.367..  Test Loss: 0.449..  Test Accuracy: 0.857
Epoch: 19/50..  Training Loss: 0.359..  Test Loss: 0.438..  Test Accuracy: 0.857
Epoch: 20/50..  Training Loss: 0.366..  Test Loss: 0.446..  Test Accuracy: 0.855
Epoch: 21/50..  Training Loss: 0.353..  Test Loss: 0.453..  Test Accuracy: 0.849
Epoch: 22/50..  Training Loss: 0.352..  Test Loss: 0.453..  Test Accuracy: 0.852
Epoch: 23/50..  Training Loss: 0.353..  Test Loss: 0.446..  Test Accuracy: 0.850
Epoch: 24/50..  Training Loss: 0.355..  Test Loss: 0.476..  Test Accuracy: 0.846
Epoch: 25/50..  Training Loss: 0.346..  Test Loss: 0.487..  Test Accuracy: 0.846
Epoch: 26/50..  Training Loss: 0.343..  Test Loss: 0.424..  Test Accuracy: 0.865
Epoch: 27/50..  Training Loss: 0.353..  Test Loss: 0.479..  Test Accuracy: 0.848
Epoch: 28/50..  Training Loss: 0.347..  Test Loss: 0.463..  Test Accuracy: 0.851
Epoch: 29/50..  Training Loss: 0.341..  Test Loss: 0.464..  Test Accuracy: 0.854
Epoch: 30/50..  Training Loss: 0.341..  Test Loss: 0.459..  Test Accuracy: 0.858
Epoch: 31/50..  Training Loss: 0.335..  Test Loss: 0.502..  Test Accuracy: 0.846
Epoch: 32/50..  Training Loss: 0.338..  Test Loss: 0.483..  Test Accuracy: 0.849
Epoch: 33/50..  Training Loss: 0.339..  Test Loss: 0.450..  Test Accuracy: 0.855
Epoch: 34/50..  Training Loss: 0.334..  Test Loss: 0.520..  Test Accuracy: 0.845
Epoch: 35/50..  Training Loss: 0.344..  Test Loss: 0.477..  Test Accuracy: 0.860
Epoch: 36/50..  Training Loss: 0.333..  Test Loss: 0.480..  Test Accuracy: 0.861
Epoch: 37/50..  Training Loss: 0.327..  Test Loss: 0.481..  Test Accuracy: 0.851
Epoch: 38/50..  Training Loss: 0.335..  Test Loss: 0.447..  Test Accuracy: 0.856
Epoch: 39/50..  Training Loss: 0.341..  Test Loss: 0.460..  Test Accuracy: 0.857
Epoch: 40/50..  Training Loss: 0.329..  Test Loss: 0.442..  Test Accuracy: 0.861
Epoch: 41/50..  Training Loss: 0.335..  Test Loss: 0.455..  Test Accuracy: 0.863
Epoch: 42/50..  Training Loss: 0.330..  Test Loss: 0.453..  Test Accuracy: 0.858
Epoch: 43/50..  Training Loss: 0.322..  Test Loss: 0.495..  Test Accuracy: 0.855
Epoch: 44/50..  Training Loss: 0.326..  Test Loss: 0.456..  Test Accuracy: 0.861
Epoch: 45/50..  Training Loss: 0.325..  Test Loss: 0.467..  Test Accuracy: 0.852
Epoch: 46/50..  Training Loss: 0.344..  Test Loss: 0.478..  Test Accuracy: 0.851
Epoch: 47/50..  Training Loss: 0.334..  Test Loss: 0.465..  Test Accuracy: 0.857
Epoch: 48/50..  Training Loss: 0.328..  Test Loss: 0.471..  Test Accuracy: 0.862
Epoch: 49/50..  Training Loss: 0.322..  Test Loss: 0.488..  Test Accuracy: 0.854
Epoch: 50/50..  Training Loss: 0.329..  Test Loss: 0.473..  Test Accuracy: 0.856

In [12]:
plt.plot(train_losses, label='Training loss')
plt.plot(test_losses, label='Validation loss')
plt.legend(frameon=False)


Out[12]:
<matplotlib.legend.Legend at 0x1be9ed8b5c0>

Inference

Now that the model is trained, we can use it for inference. We've done this before, but now we need to remember to set the model in inference mode with model.eval(). You'll also want to turn off autograd with the torch.no_grad() context.


In [13]:
# Import helper module 
import helper

# Evaluation
model.eval()

dataiter = iter(testloader)
images, labels = dataiter.next()
img = images[0]
# Convert 2D image to 1D vector
img = img.view(1, 784)

# Calculate the class probabilities (softmax) for img
with torch.no_grad():
    output = model.forward(img)

ps = torch.exp(output)

# Plot the image and probabilities
helper.view_classify(img.view(1, 28, 28), ps, version='Fashion')


Next Up!

In the next part, I'll show you how to save your trained models. In general, you won't want to train a model everytime you need it. Instead, you'll train once, save it, then load the model when you want to train more or use if for inference.